Orthographic Structuring of Human Speech and Texts: Linguistic Application of Recurrence Quantification Analysis

Authors

  • Franco Orsucci
  • Kimberly Walter
  • Alessandro Giuliani
  • Charles L. Webber
  • Joseph P. Zbilut

Abstract

A methodology based upon recurrence quantification analysis is proposed for the study of the orthographic structure of written texts. Five different orthographic data sets (20th century Italian poems, 20th century American poems, contemporary Swedish poems with their corresponding Italian translations, Italian speech samples, and American speech samples) were subjected to recurrence quantification analysis, a procedure which has been found to be diagnostically useful in the quantitative assessment of ordered series in fields such as physics, molecular dynamics, physiology, and general signal processing. Recurrence quantification was developed from recurrence plots as applied to the analysis of nonlinear, complex systems in the physical sciences, and is based on the computation of a distance matrix of the elements of an ordered series (in this case the letters constituting selected speech and poetic texts). From a strictly mathematical view, the results show the possibility of demonstrating invariance between different language exemplars despite the apparent low level of coding (orthography). Comparison with the actual texts confirms the ability of the method to reveal recurrent structures and their complexity. Using poems as a reference standard for judging speech complexity, the technique exhibits language independence, order dependence and freedom from pure statistical characteristics of studied sequences, as well as consistency with easily identifiable texts. Such studies may provide phenomenological markers of hidden structure as coded by the purely orthographic level.

Affiliations

  • Institute for Complexity Studies, American University, Rome, Italy
  • TCE Laboratory, Istituto Superiore di Sanita, V.le Regina Elena 299 Rome, Italy
  • Department of Physiology, Loyola University, 2160 S. First Ave., Maywood, IL 60153 USA
  • Department of Molecular Biophysics and Physiology, Rush University, 1653 W. Congress, Chicago, IL 60612 USA
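
The abstract describes recurrence quantification as a computation over a distance matrix of the letters in a text. A minimal sketch of that idea follows. It is not the authors' implementation: the function names (`letters_only`, `recurrence_matrix`, `rqa_measures`), the embedding length `embed`, the minimum line length `min_line`, and the choice to treat two windows as recurrent only when they are identical (distance zero) are all assumptions made for illustration, since the abstract does not state the parameters used.

```python
def letters_only(text):
    """Lowercase the text and keep alphabetic characters only (assumed preprocessing)."""
    return [ch for ch in text.lower() if ch.isalpha()]


def recurrence_matrix(series, embed=1):
    """Return R where R[i][j] is True when the windows of length `embed`
    starting at positions i and j are identical (distance zero)."""
    n = len(series) - embed + 1
    windows = [tuple(series[i:i + embed]) for i in range(n)]
    return [[windows[i] == windows[j] for j in range(n)] for i in range(n)]


def rqa_measures(matrix, min_line=2):
    """Return (recurrence rate, determinism) computed over the upper triangle,
    excluding the main diagonal.

    Recurrence rate: fraction of point pairs that recur.
    Determinism: fraction of recurrent points lying on diagonal lines
    of at least `min_line` consecutive points.
    """
    n = len(matrix)
    recurrent = 0
    in_lines = 0
    for k in range(1, n):              # k-th diagonal above the main one
        run = 0
        for i in range(n - k):
            if matrix[i][i + k]:
                recurrent += 1
                run += 1
            else:
                if run >= min_line:
                    in_lines += run
                run = 0
        if run >= min_line:            # close a line that reaches the diagonal's end
            in_lines += run
    pairs = n * (n - 1) // 2
    rec_rate = recurrent / pairs if pairs else 0.0
    determinism = in_lines / recurrent if recurrent else 0.0
    return rec_rate, determinism


if __name__ == "__main__":
    for sample in ("abc abc", "aabbcc"):
        rec, det = rqa_measures(recurrence_matrix(letters_only(sample)))
        print(sample, rec, det)
```

The two toy strings contain the same letters with the same frequencies. Under the assumptions above, both give a recurrence rate of 0.2, but the determinism is 1.0 for "abc abc" and 0.0 for "aabbcc". This illustrates in miniature the order dependence mentioned in the abstract: the measure responds to the arrangement of letters, not only to their counts.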

Similar resources

Combining Part of Speech Induction and Morphological Induction

Linguistic information is useful in natural language processing, information retrieval and a multitude of sub-tasks involving language analysis. Two types of linguistic information in all languages are part of speech and morphology. Part of speech information reflects syntactic structure and can assist in tasks such as speech recognition, machine translation and word sense disambiguation. Morph...

Language Features of Russian Texts of Engineering Discourse

The Article is devoted to the applied problem of identifying the linguistic features of engineering texts. The study of Russian-language texts of engineering discourse is usually of an applied nature, in our case, this applied research is caused by the need to teach foreigners who receive professional engineering education in Russia and in Russian language. The object of the research is the Rus...

Combinatorics & Synchronization in Natural Semiotics

In this study the derivation of an objective metrics to appreciate the degree of structuring of written and spoken texts is presented. The proposed metrics is based on the scoring of recurrences inside a text by means of the application of Recurrence Quantification Analysis (RQA), a non linear technique widely used in other fields of sciences. The adopted approach allowed us to create a ranking...

CoRuSS - a New Prosodically Annotated Corpus of Russian Spontaneous Speech

This paper describes speech data recording, processing and annotation of a new speech corpus CoRuSS (Corpus of Russian Spontaneous Speech), which is based on connected communicative speech recorded from 60 native Russian male and female speakers of different age groups (from 16 to 77). Some Russian speech corpora available at the moment contain plain orthographic texts and provide some kind of ...

Dynamic characterization and predictability analysis of wind speed and wind power time series in Spain wind farm

The renewable energy resources such as wind power have recently attracted more researchers’ attention. It is mainly due to the aggressive energy consumption, high pollution and cost of fossil fuels. In this era, the future fluctuations of these time series should be predicted to increase the reliability of the power network. In this paper, the dynamic characteristics and short-term predictabili...

Journal title:
  • CoRR

Volume cmp-lg/9712010  Issue 

Pages  -

Publication date 1997